Authors of R packages to support Apache Spark, TensorFlow and MLflow. Contributors to tidyverse and Apache Arrow.
“Apache Spark™ is a unified analytics engine for large-scale data processing.”
Information grows at exponential rates.
We see Spark supporting multiple projects: TensorFlow, MLflow, modeling, etc.
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
In an ideal world, every R package would work with Spark, like magic. That is already the case for dplyr, through sparklyr.
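library(sparklyr)
library(dplyr)  # copy_to() and the data verbs
library(nycflights13)
# master selects the cluster: local, YARN, Mesos, Spark standalone, or Livy; "local" is used here
sc <- spark_connect(master = "local")
flights <- copy_to(sc, flights)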
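As a rough sketch of what "dplyr works with Spark" can look like in practice, the example below runs ordinary dplyr verbs against the copied table. It assumes a local Spark installation is available (for example, one installed with `sparklyr::spark_install()`). The names `flights_tbl`, `delays`, and `delays_local` are illustrative, not from the slides.

```r
library(sparklyr)
library(dplyr)
library(nycflights13)

# Assumption: a local Spark installation is available on this machine.
sc <- spark_connect(master = "local")

# Copy the local data frame to Spark; the result is a reference to a remote table.
flights_tbl <- copy_to(sc, flights, "flights", overwrite = TRUE)

# The same dplyr verbs used on local data; sparklyr translates them to Spark SQL.
delays <- flights_tbl %>%
  filter(!is.na(dep_delay)) %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE), n = n()) %>%
  arrange(desc(mean_delay))

# Print the SQL that would be sent to Spark, without collecting results.
show_query(delays)

# collect() runs the query in Spark and returns the small result as a local tibble.
delays_local <- collect(delays)

spark_disconnect(sc)
```

Nothing is computed in R until `collect()` is called, so only the aggregated result, not the full flights table, travels back from Spark.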